energy cost
A faster way to estimate AI power consumption
Due to the explosive growth of artificial intelligence, it is estimated that data centers will consume up to 12 percent of total U.S. electricity by 2028, according to the Lawrence Berkeley National Laboratory. Improving data center energy efficiency is one way scientists are striving to make AI more sustainable. Toward that goal, researchers from MIT and the MIT-IBM Watson AI Lab developed a rapid prediction tool that tells data center operators how much power will be consumed by running a particular AI workload on a certain processor or AI accelerator chip. Their method produces reliable power estimates in a few seconds, unlike traditional modeling techniques that can take hours or even days to yield results. Moreover, their prediction tool can be applied to a wide range of hardware configurations -- even emerging designs that haven't been deployed yet.
AI could put people off tech jobs and hurt the economy, warns Raspberry Pi boss
The founder of British computer maker Raspberry Pi has warned that overestimating the abilities of Artificial Intelligence (AI) could put people off pursuing tech jobs and hurt the economy. Eben Upton told the BBC's Big Boss Interview podcast this could distort people's choices in ways that make that skill shortage worse and not better. Some people are very inclined to overestimate what these [AI] tools can do, he said, and warned against claims that it would destroy vast numbers of computing roles over the coming years. The rise of tools such as ChatGPT and Claude have led to predictions of huge job losses, particularly for tech workers and graduates. Amazon, Meta and Microsoft have already blamed tens of thousands of layoffs on AI over the last year.
Joint Energy Management and Coordinated AIGC Workload Scheduling for Distributed Data Centers: A Diffusion-Aided Reward Shaping Approach
Fu, Yang, Qin, Peng, Chen, Liming, Zhang, Zihao, Yu, Hao, Wang, Yifei
Artificial intelligence-generated content (AIGC) has emerged as a transformative paradigm for automating the creation of diverse and customized content, giving rise to rapidly growing computational workloads in cloud data centers. It is imperative for AIGC service providers (ASPs) to strategically schedule AIGC workloads to reduce data center energy costs while guaranteeing high-quality content generation. However, the distinctive characteristics of AIGC services pose critical challenges, including model heterogeneity across ASPs, implicit service quality evaluation, and complex inference process control. To tackle these challenges, we propose a joint energy management and coordinated AIGC workload scheduling framework, which introduces an explicit mathematical characterization of service quality to promote both job transfer among ASPs and fine-grained inference process configuration. Moreover, various energy resources within data centers are jointly considered to enhance power usage flexibility. Subsequently, a system utility maximization problem is formulated to balance AIGC service revenue with operational penalties and costs. Nevertheless, the strong coupling among job scheduling decisions induces severe reward sparsity, which limits the effectiveness of existing deep reinforcement learning (DRL) algorithms. To address this issue, we develop a diffusion model-aided reward shaping approach to synthesize complementary reward signals through a multi-step denoising process. This approach is seamlessly integrated with DRL to enable efficient learning of scheduling policies under sparse environmental feedback. Experiments based on real-world models and datasets demonstrate that our scheme effectively accommodates electricity price fluctuations and AIGC model heterogeneity, while achieving superior learning convergence and system utility compared with benchmark methods.
ELANA: A Simple Energy and Latency Analyzer for LLMs
Chiang, Hung-Yueh, Wang, Bokun, Marculescu, Diana
The latency and power consumption of large language models (LLMs) are major constraints when serving them across a wide spectrum of hardware platforms, from mobile edge devices to cloud GPU clusters. Benchmarking is crucial for optimizing efficiency in both model deployment and next-generation model development. To address this need, we open-source a simple profiling tool, \textbf{ELANA}, for evaluating LLMs. ELANA is designed as a lightweight, academic-friendly profiler for analyzing model size, key-value (KV) cache size, prefilling latency (Time-to-first-token, TTFT), generation latency (Time-per-output-token, TPOT), and end-to-end latency (Time-to-last-token, TTLT) of LLMs on both multi-GPU and edge GPU platforms. It supports all publicly available models on Hugging Face and offers a simple command-line interface, along with optional energy consumption logging. Moreover, ELANA is fully compatible with popular Hugging Face APIs and can be easily customized or adapted to compressed or low bit-width models, making it ideal for research on efficient LLMs or for small-scale proof-of-concept studies. We release the ELANA profiling tool at: https://github.com/enyac-group/Elana.
TokenPowerBench: Benchmarking the Power Consumption of LLM Inference
Niu, Chenxu, Zhang, Wei, Li, Jie, Zhao, Yongjian, Wang, Tongyang, Wang, Xi, Chen, Yong
Large language model (LLM) services now answer billions of queries per day, and industry reports show that inference, not training, accounts for more than 90% of total power consumption. However, existing benchmarks focus on either training/fine-tuning or performance of inference and provide little support for power consumption measurement and analysis of inference. We introduce TokenPowerBench, the first lightweight and extensible benchmark designed for LLM-inference power consumption studies. The benchmark combines (i) a declarative configuration interface covering model choice, prompt set, and inference engine, (ii) a measurement layer that captures GPU-, node-, and system-level power without specialized power meters, and (iii) a phase-aligned metrics pipeline that attributes energy to the prefill and decode stages of every request. These elements make it straight-forward to explore the power consumed by an LLM inference run; furthermore, by varying batch size, context length, parallelism strategy and quantization, users can quickly assess how each setting affects joules per token and other energy-efficiency metrics. We evaluate TokenPowerBench on four of the most widely used model series (Llama, Falcon, Qwen, and Mistral). Our experiments cover from 1 billion parameters up to the frontier-scale Llama3-405B model. Furthermore, we release TokenPowerBench as open source to help users to measure power consumption, forecast operating expenses, and meet sustainability targets when deploying LLM services.